By analyzing the conditional probabilities of the process, the dynamic programming state-transition equation and the optimal expected-cost equation are derived, and from them a decision strategy for association rule discovery is obtained.
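For orientation, the generic textbook form of an optimal expected-cost (Bellman) equation is shown below. It is not the specific equation derived in the work this sentence describes; the symbols $s$, $a$, $c(s,a)$, $P(s' \mid s,a)$ and $V^{*}$ are assumed here purely for illustration.

$$
V^{*}(s) = \min_{a} \Big[ c(s,a) + \sum_{s'} P(s' \mid s,a)\, V^{*}(s') \Big],
\qquad
\pi^{*}(s) = \arg\min_{a} \Big[ c(s,a) + \sum_{s'} P(s' \mid s,a)\, V^{*}(s') \Big]
$$

Here $P(s' \mid s,a)$ plays the role of the conditional transition probability, and the optimal strategy $\pi^{*}$ picks the action with the lowest expected cost in each state.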
2.
Reinforcement learning algorithms that use the cerebellar model articulation controller (CMAC) are studied for estimating the optimal action-value function of Markov decision processes (MDPs) with continuous states and discrete actions. The way SARSA learning algorithms based on CMAC networks and direct gradient rules discretize the MDP state space is analyzed. Two new coding structures for CMAC neural networks, non-adjacent overlapping coding and variable-scale coding, are proposed to improve the convergence speed and generalization performance of CMAC-based direct gradient learning algorithms.
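As a rough illustration of the baseline setup, here is a minimal sketch of a CMAC (tile-coding) action-value approximator with a SARSA update that treats the bootstrapped target as fixed. It assumes a one-dimensional state in [low, high] and standard uniformly shifted tilings; it does not implement the non-adjacent overlapping or variable-scale codings mentioned above, and the class name `CMAC`, the function `sarsa_step`, and all parameter values are hypothetical.

```python
import numpy as np


class CMAC:
    """Minimal CMAC (tile coding) approximator for Q(s, a) over a 1-D state."""

    def __init__(self, n_tilings=8, n_tiles=10, n_actions=2, low=0.0, high=1.0):
        self.n_tilings = n_tilings
        self.n_tiles = n_tiles
        self.low = low
        self.width = (high - low) / n_tiles
        # Assumption: each tiling is shifted by an equal fraction of a tile width.
        self.offsets = np.arange(n_tilings) * self.width / n_tilings
        # One extra tile per tiling so the shifted tilings still cover the range.
        self.w = np.zeros((n_actions, n_tilings, n_tiles + 1))

    def active_tiles(self, s):
        """Return the index of the single active tile in every tiling."""
        idx = np.floor((s - self.low + self.offsets) / self.width).astype(int)
        return np.clip(idx, 0, self.n_tiles)

    def q(self, s, a):
        """Q(s, a) is the sum of the weights of the active tiles."""
        tiles = self.active_tiles(s)
        return self.w[a, np.arange(self.n_tilings), tiles].sum()

    def update(self, s, a, target, alpha=0.1):
        """Move Q(s, a) toward target; the step is split across the tilings."""
        tiles = self.active_tiles(s)
        delta = target - self.q(s, a)
        self.w[a, np.arange(self.n_tilings), tiles] += alpha / self.n_tilings * delta


def sarsa_step(cmac, s, a, r, s_next, a_next, gamma=0.95, alpha=0.1, done=False):
    """One SARSA update; the target is held constant when taking the gradient."""
    target = r if done else r + gamma * cmac.q(s_next, a_next)
    cmac.update(s, a, target, alpha)
```

Because neighbouring states share some but not all active tiles, a single update also changes the estimate for nearby states, which is the implicit state discretization and generalization that the coding structure controls.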